Bimodal Corpora Terminology Extraction: Another Brick in the Wall

Authors

  • Claudiu Mihaila
  • Dalila Mekhaldi
Abstract

This paper presents a new study on automatic terminology extraction from bimodal corpora generated from lectures and meetings. More specifically, the study examines to what extent written text (the documents discussed) and spoken text (the dialogue transcripts) share keywords. Using a hybrid terminology extraction approach, experiments were performed on a collection of bimodal English corpora comprising one corpus of scientific conference presentations and two corpora of decision-making meetings. The evaluation results highlight a difference between the keywords extracted from written text and those extracted from spoken text. Moreover, the results emphasise the importance of considering text from both modalities in order to generate rich and consistent keyword lists for bimodal corpora.

Similar resources

Terminology Extraction from Comparable Corpora for Latvian

This paper presents the work on terminology extraction from comparable corpora for Latvian. In the first section we introduce our work; the second section briefly describes the concept of the project and the implemented general terminology processing chain; the following two sections focus on terminology extraction workflow for Latvian and evaluation of results, respectively.

Word Co-occurrence Counts Prediction for Bilingual Terminology Extraction from Comparable Corpora

Methods dealing with bilingual lexicon extraction from comparable corpora are often based on word co-occurrence observation and are by essence more effective when using large corpora. In most cases, specialized comparable corpora are of small size, and this particularity has a direct impact on bilingual terminology extraction results. In order to overcome insufficient data coverage and to make ...

TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction

The manual identification of terminology from specialized corpora is a complex task that needs to be addressed by flexible tools, in order to facilitate the construction of multilingual terminologies which are the main resources for computer-assisted translation tools, machine translation or ontologies. The automatic terminology extraction tools developed so far either use a proprietary code or...

Parallel Corpora, Terminology Extraction und Machine Translation

In this paper we first give an overview of parallel corpus annotation, alignment and retrieval. We present standard annotation methods such as Part-of-Speech tagging, lemmatization and dependency parsing, but we also introduce language-specific methods, e.g. for dealing with split verbs or truncated compounds in German. We argue for careful sentence and word alignment for parallel corpora. And ...

Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction

The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced. However, the historical context-based projection method dedicated to this task is relatively insensitive to the sizes of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpo...

Publication date: 2009